Using Explicit Control Processes in Distributed Workflows to Gather Provenance
Abstract
Distributing workflow tasks across high-performance environments involves both local processing and remote execution on clusters and grids. This distribution often requires interoperation between heterogeneous workflow definition languages and their corresponding execution engines. For example, a centralized Workflow Management System (WfMS) may control a workflow locally while relying on a grid WfMS to execute a sub-workflow that requires high performance. Workflow specification languages often provide different control-flow structures, so moving from one environment to another requires mappings between these languages. Because of this heterogeneity, control-flow structures available in one system may not be supported in another. In such heterogeneous distributed environments, provenance gathering also becomes heterogeneous. This work presents control-flow modules that aim to be independent of the WfMS. Inserting these modules into the workflow specification makes workflow execution control less dependent on heterogeneous workflow execution engines. The modules can also gather provenance data from both local and remote execution, allowing the same provenance registration in both environments regardless of the underlying WfMS. The proposed modules extend ordinary workflow tasks by providing dynamic behavioral execution control. They were implemented in the VisTrails graphical workflow enactment engine, which offers a flexible infrastructure for provenance gathering.
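The idea of an explicit control-flow module that also registers provenance can be illustrated with a minimal Python sketch. It is not the VisTrails API and not the authors' implementation: `ProvenanceLog`, `IfModule`, the record fields, and the `site` label are hypothetical names chosen for illustration, and tasks are plain Python callables rather than real local or grid jobs.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ProvenanceLog:
    """Hypothetical engine-independent provenance store (a list of dicts)."""
    records: List[Dict[str, Any]] = field(default_factory=list)

    def record(self, **entry: Any) -> None:
        self.records.append(entry)


class IfModule:
    """Hypothetical explicit control-flow module.

    The branching decision is made by the module itself, not by the
    engine, and every task run is logged in one uniform format.
    """

    def __init__(self, name: str, log: ProvenanceLog,
                 condition: Callable[[Any], bool],
                 then_task: Callable[[Any], Any],
                 else_task: Callable[[Any], Any],
                 site: str = "local") -> None:
        self.name = name
        self.log = log
        self.condition = condition
        self.then_task = then_task
        self.else_task = else_task
        self.site = site  # only a label here; no remote call is made

    def execute(self, value: Any) -> Any:
        # Choose the branch explicitly, then run it and log the run.
        task = self.then_task if self.condition(value) else self.else_task
        started = time.time()
        output = task(value)
        self.log.record(module=self.name, task=task.__name__,
                        site=self.site, input=value, output=output,
                        started=started, ended=time.time())
        return output


def double(x: int) -> int:
    return x * 2


def negate(x: int) -> int:
    return -x


log = ProvenanceLog()
choose = IfModule("choose", log, condition=lambda x: x > 0,
                  then_task=double, else_task=negate, site="remote")
result_a = choose.execute(3)
result_b = choose.execute(-4)
```

Because the module owns both the branching decision and the logging call, the same record layout results whichever engine hosts the tasks, which is the property the abstract attributes to its control-flow modules.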
Related resources
On the Use of Abstract Workflows to Capture Scientific Process Provenance
Capturing provenance about artifacts produced by distributed scientific processes is a challenging task. For example, one approach to facilitate the execution of a scientific process in distributed environments is to break down the process into components and to create workflow specifications to orchestrate the execution of these components. However, capturing provenance in such an environment,...
A Provenance-Integration Framework for Distributed Workflows in Grid Environments
Provenance information about complex and distributed workflows is a key issue for data quality control and data reliability maintenance in reservoir management. Distributed and integrated environments where different workflows consume and transform data require a comprehensive provenance view. In this scenario provenance collection and integration presents significant challenges. In this paper,...
An Identity Crisis in the Life Sciences
myGrid is an e-Science project assisting life scientists to build workflows that gather and co-ordinate data from distributed, autonomous, replicated and heterogeneous resources. The provenance logs of workflow executions are recorded as RDF graphs. The log of one workflow run is used to trace the history of its execution process; however, by aggregating provenance logs of workflow reruns, or run...
Atomicity and provenance support for pipelined scientific workflows
Today many significant scientific discoveries are achieved through complex and distributed scientific computations that are structured and represented as scientific workflows. Although atomicity is a well studied topic in transaction processing and business workflows, such an important capability needs to be revisited in a scientific workflow environment. Firstly, the semantics of atomicity nee...
Monitoring of Grid scientific workflows
Scientific workflows are a means of conducting in silico experiments in modern computing infrastructures for e-Science, often built on top of Grids. Monitoring of Grid scientific workflows is essential not only for performance analysis but also to collect provenance data and gather feedback useful in future decisions, e.g., related to optimization of resource usage. In this paper, basic problem...
Publication date: 2008